Building and Documenting Workflows with Python-Based Snakemake

نویسندگان

  • Johannes Köster
  • Sven Rahmann
چکیده

Snakemake is a novel workflow engine with a simple Python-derived workflow definition language and an optimizing execution environment. It is the first system that supports multiple named wildcards (or variables) in input and output filenames of each rule definition. It also allows to write human-readable workflows that document themselves. We have found Snakemake especially useful for building high-throughput sequencing data analysis pipelines and present examples from this area. Snakemake exemplifies a generic way to implement a domain specific language in python, without writing a full parser or introducing syntactical overhead by overloading language features. 1998 ACM Subject Classification D.3.2 Language Classifications, D.3.4 Processors, J.3 Life and Medical Sciences

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Snakemake - a scalable bioinformatics workflow engine

SUMMARY Snakemake is a workflow engine that provides a readable Python-based workflow definition language and a powerful execution environment that scales from single-core workstations to compute clusters without modifying the workflow. It is the first system to support the use of automatically inferred multiple named wildcards (or variables) in input and output filenames. AVAILABILITY http:/...

متن کامل

Sequanix: A Dynamic Graphical Interface for Snakemake Workflows.

Summary We designed a PyQt graphical user interface - Sequanix - aimed at democratizing the use of Snakemake pipelines in the NGS space and beyond. By default, Sequanix includes Sequana NGS pipelines (Snakemake format) (http://sequana.readthedocs.io), and is also capable of loading any external Snakemake pipeline. New users can easily, visually, edit configuration files of expert-validated pipe...

متن کامل

ATLAS (Automatic Tool for Local Assembly Structures) - a comprehensive infrastructure for assembly, annotation, and genomic binning of metagenomic and metatranscriptomic data

Summary: ATLAS (Automatic Tool for Local Assembly Structures) is a comprehensive multi-omics data analysis pipeline that is massively parallel and scalable. ATLAS contains a modular analysis pipeline for assembly, annotation, quantification and genome binning of metagenomics and metatranscriptomics data and a framework for reference metaproteomic database construction. ATLAS transforms raw sequ...

متن کامل

Enabling Cross-Domain Collaboration in Molecular Dynamics Workflows

Molecular dynamics (MD) simulation is used increasingly often for materials design to reduce the costs associated with the pure experimental approach. Complex MD simulations are, however, notoriously hard to set up. It requires expertise in several distinct areas, including the peculiarities of a particular simulator tool, the chemical properties of the family of materials being studied, as wel...

متن کامل

Processing: A Python Framework for the Seamless Integration of Geoprocessing Tools in QGIS

Processing is an object-oriented Python framework for the popular open source Geographic Information System QGIS, which provides a seamless integration of geoprocessing tools from a variety of different software libraries. In this paper, we present the development history, software architecture and features of the Processing framework, which make it a versatile tool for the development of geopr...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012